Abstract

A recent trend in compressed sensing is to consider non-convex optimization techniques for sparse recovery.
A general class of such optimizations, called F-minimization, has become of particular interest, since its exact
reconstruction condition (ERC) in the noiseless setting can be precisely characterized by the null space property (NSP).
However, little work has been done concerning its robust reconstruction condition (RRC) in the noisy setting. In this 
paper we look at the null space of the measurement matrix as a point on the Grassmann manifold, and then study the 
relation of the ERC and RRC sets on the Grassmannian. It is shown that the RRC set is exactly the topological interior 
of the ERC set. From this characterization, a previous result of the equivalence of ERC and RRC for $\ell_p$-minimization
follows easily as a special case. Moreover, when F is non-decreasing, it is shown that the ERC and RRC sets are 
equivalent up to a set of measure zero. As a consequence, the probabilities of ERC and RRC are the same if the 
measurement matrix is randomly generated according to a continuous distribution. Finally, we provide several rules 
for comparing the performances of different cost functions, as applications of the above results. 

Index Terms 

Reconstruction algorithms, compressed sensing, minimization methods, robustness, null space 

I. Introduction 

Compressed sensing is a method of recovering a sparse signal from a set of under-determined linear measurements.
Ideally, the optimal reconstruction method is $\ell_0$ norm minimization:

$$\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{A}\mathbf{x}, \tag{1}$$

where $\mathbf{A}$ is an $m \times n$ measurement matrix, $\mathbf{y} \in \mathbb{R}^m$ is the vector of linear measurements, and we assume that $m < n$.
It can be proved that $\ell_0$ minimization requires the least possible number of measurements; however, it is
computationally intractable since it is a hard combinatorial problem. Therefore, many
algorithms have been proposed to reduce the computational complexity. Roughly speaking, these algorithms fall
into two categories: 1) minimization techniques, where the sparse solution is retrieved by minimizing an appropriate
cost function [1], [2], and 2) greedy pursuits, a representative of which is orthogonal matching pursuit (OMP)
[3].
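
To illustrate why (1) is a combinatorial problem, the following is a minimal sketch of an exhaustive search over supports, written with exact rational arithmetic so that no numerical tolerance is needed. The helper names `solve_on_support` and `l0_minimize` are hypothetical and are not taken from any reference implementation; the number of candidate supports grows as $\sum_s \binom{n}{s}$, which is what makes the approach impractical beyond toy sizes.

```python
from fractions import Fraction
from itertools import combinations


def solve_on_support(A, y, support):
    """Solve A[:, support] c = y exactly.

    Returns the coefficient list c, or None if the system is inconsistent
    or the selected columns are linearly dependent.
    """
    m, s = len(A), len(support)
    # Augmented matrix [A_S | y] in exact rational arithmetic.
    M = [[Fraction(A[i][j]) for j in support] + [Fraction(y[i])] for i in range(m)]
    row = 0
    for col in range(s):
        pivot = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pivot is None:
            return None  # dependent columns: a smaller support covers this case
        M[row], M[pivot] = M[pivot], M[row]
        lead = M[row][col]
        M[row] = [v / lead for v in M[row]]
        for r in range(m):
            if r != row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[row])]
        row += 1
    # Rows below the pivots must read 0 = 0 for the system to be consistent.
    if any(M[r][s] != 0 for r in range(row, m)):
        return None
    return [M[i][s] for i in range(s)]


def l0_minimize(A, y):
    """Return a sparsest x with A x = y by trying supports of growing size."""
    n = len(A[0])
    for size in range(n + 1):
        for support in combinations(range(n), size):
            c = solve_on_support(A, y, support)
            if c is not None:
                x = [Fraction(0)] * n
                for j, value in zip(support, c):
                    x[j] = value
                return x
    return None


# Toy instance: the 1-sparse signal (0, 0, 3, 0) measured by a 2 x 4 matrix.
A = [[1, 0, 1, 1], [0, 1, 1, -1]]
y = [3, 3]
x_hat = l0_minimize(A, y)
```

In this toy instance no single column other than the third is parallel to $\mathbf{y}$, so the search stops at the first support of size one that is consistent with the measurements.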

In general, the greedy algorithms often incur less computational complexity, but the minimization techniques are
more advantageous in terms of accuracy. The most basic minimization technique is $\ell_1$-minimization, or Basis
Pursuit (BP) [4], [1], [5]:

$$\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{A}\mathbf{x}, \tag{2}$$

which is a simple convex optimization and can be recast as a linear program. Recently there has been a trend to consider
minimizing non-convex cost functions. Examples include:

• $\ell_p$ cost function. The $\ell_p$-minimization ($0 < p < 1$) [6], [7], [8], [9] considers an optimization problem similar
to (2), but the cost function is replaced with $\|\mathbf{x}\|_p^p$.

• Approximate $\ell_0$ cost function [10], [11], [2], [12].

Although the non-convex nature of these cost functions makes it difficult to exactly solve the corresponding
optimization problems, various practical algorithms can be adapted to these non-convex problems, including
iteratively re-weighted least squares minimization (IRLS) [13], [14] and the iterative thresholding algorithm (IT) [15],
which are based on fixed point iteration; and the zero point attracting projection algorithm (ZAP) [2], [16], [17],
which is based on Newton's method for solving nonlinear optimization. In general the non-convex algorithms have
empirically outperformed BP in various respects, because nonlinear cost functions can better promote sparsity
than the $\ell_1$ cost function. Thus, a detailed study of the reconstruction properties of these sparse recovery methods
remains important.

Most of these non-convex optimizations can be subsumed in a general category called "F-minimization" [18], 
in which the cost function satisfies some desirable properties, such as subadditivity. The precise definition of the 
class of cost functions of our interest will be given in the next section. 

Two concepts arise naturally in the compressed sensing problem: The exact recovery condition (ERC) in the 
noiseless setting and the robust recovery condition (RRC) in the noisy setting. In the literature, ERC typically
requires that all sparse signals can be exactly recovered. In addition to this, RRC requires that if the measurement 
is noisy, the reconstruction error is bounded by the norm of the noise vector multiplied by a constant factor. 

While the rigorous definitions of ERC and RRC are deferred to Section II, we remark here in passing that RRC
trivially implies ERC, because ERC can be seen as a special case of RRC where the measurement is free of noise.
Conversely, it is not obvious whether ERC also implies RRC, or RRC is strictly stronger than ERC. Early work
in compressed sensing has provided sufficient conditions for ERC and RRC of $\ell_1$-minimization, based on the
so-called restricted isometry property (RIP) [4], and those sufficient conditions appear to be identical. However,
analysis based on RIP generally fails to provide exact (necessary and sufficient) conditions for ERC and RRC.
Another line of research has considered the null space property (NSP), which gives a necessary and sufficient
condition for ERC of $\ell_p$-minimization. In addition, [19] provided a sufficient condition, called NSP', for RRC
of $\ell_p$-minimization. Later Aldroubi et al. proved in [18] that NSP and NSP' are in fact equivalent. Hence,
ERC and RRC are actually the same condition for $\ell_p$-minimization.

In contrast to the special case of $\ell_p$-minimization, the robust recovery condition for the more general case of
F-minimization has been recognized as "not easy to establish" [18], merely based on the idea of NSP. The fundamental
issue of robustness in F-minimization has remained relatively unexplored.

The purpose of this paper is to give an exact characterization of the relationship between ERC and RRC in
the general F-minimization problem. We first show that ERC and RRC depend only on the configuration of the
null space of the measurement matrix (knowing all the entries of the matrix is of course sufficient, but not necessary).
Moreover, since the null spaces are linear subspaces of the Euclidean space, they can be viewed as
points on a Grassmann manifold, which has a natural topological structure; hence concepts such as open sets and
interior are well defined for collections of null spaces. We denote by $\Omega$ and $\Omega^r$ the sets that consist of the
null spaces satisfying ERC and RRC for the F-minimization, respectively. We show that $\Omega^r$ is exactly the interior
of $\Omega$ (Theorem 2). Hence we can give an alternative proof of the equivalence of ERC and RRC in $\ell_p$-minimization,
by simply showing that $\Omega$ is open in this special case. We would like to remark that this analytical framework also
gives rise to new ideas and results, including:

• Equivalence of ERC and RRC in probability. Under some mild assumptions we show that $\Omega$ and $\Omega^r$ differ by
a set of measure zero. Building on this, we show that ERC and RRC hold true with the same probability if the
measurement matrix is randomly generated according to a continuous distribution.

• Comparison between different sparseness measures. It is interesting and valuable to know how the performances
of different sparseness measures compare. Gribonval et al. [8, Lemma 7] provided a condition under which one
sparseness measure is better than another in the sense of ERC. Combining this with our result, we show that this
condition also provides a comparison in terms of RRC. Moreover, with the concept of a measure zero set on the
Grassmannian, we are able to provide additional comparison rules which guarantee that one sparseness measure is better than
the other in terms of the probability of ERC/RRC.

The organization of the paper is as follows. In Section II we present the mathematical formulation of the problem 
and a brief introduction to null space property and the Grassmann manifold. Section III studies the relationship 
between ERC and RRC: Part A gives an exact characterization of RRC set as the interior of ERC set on the 
Grassmannian; in Part B we show that the ERC and RRC sets differ by a set of measure zero. In Section IV we
provide some rules for comparing the performance of different sparseness measures. Section V compares our approach
and definitions with similar ones in the literature. Finally in Section VI we conclude by reviewing the results and
pointing out possible directions for future work. 

II. Problem Setup and Key Definitions 

This section provides the mathematical formulation of the problem and the definitions of some key concepts. We
shall use lower case bold letters for vectors, and upper case bold letters for matrices. The notation $M(m, n)$ denotes the
set of $m \times n$ real matrices. Throughout the paper we suppose the observation matrix is $m \times n$, and set $l := n - m$,
unless otherwise indicated. $\|\mathbf{x}\|_0$ refers to the $\ell_0$ norm$^1$ of $\mathbf{x}$, i.e., the number of non-zero elements in the vector,
and $\|\mathbf{x}\|_p := \left(\sum_{k} |x(k)|^p\right)^{1/p}$ denotes the $\ell_p$ norm of $\mathbf{x}$.

A. Basic Model 

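Let $\bar{\mathbf{x}} \in \mathbb{R}^n$, $\mathbf{A} \in M(m, n)$, $\mathbf{v} \in \mathbb{R}^m$ be the sparse signal, the measurement matrix, and the additive noise,
respectively. Let $T := \mathrm{supp}(\bar{\mathbf{x}})$ be the support of $\bar{\mathbf{x}}$. The vector $\bar{\mathbf{x}}$ is called $k$-sparse if $|T| \le k$. The linear measurement
$\mathbf{y}$ is given by

$$\mathbf{y} = \mathbf{A}\bar{\mathbf{x}} + \mathbf{v}. \tag{3}$$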

We consider the problem of recovering the sparse signal through an optimization. Supposing $F : [0, +\infty) \to [0, +\infty)$ is a given
function, we define the cost function

$$J(\mathbf{x}) := \sum_{k=1}^{n} F(|x(k)|). \tag{4}$$

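With a slight abuse of notation, we shall also write:

$$J(\mathbf{x}_T) := \sum_{k \in T} F(|x(k)|),$$

$$J(\mathbf{x}_{T^c}) := \sum_{k \in T^c} F(|x(k)|),$$

where $\mathbf{x}_T \in \mathbb{R}^{|T|}$ and $\mathbf{x}_{T^c} \in \mathbb{R}^{n-|T|}$ denote the restrictions of $\mathbf{x}$ to the sets $T$ and $T^c$, respectively. Clearly (4) is a very
general model: For example, if one chooses $F(x) = 1_{x>0}$ then $J(\mathbf{x}) = \|\mathbf{x}\|_0$; if $F(x) = x^p$ then $J(\mathbf{x}) = \|\mathbf{x}\|_p^p$.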
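
As a small illustration of how the separable cost in (4) specializes, the sketch below evaluates $J$ for a user-supplied $F$ and recovers the $\ell_0$ and $\ell_p^p$ costs mentioned above. The function names (`cost`, `restricted_cost`, `F_l0`, `F_lp`) are hypothetical and chosen only for this example.

```python
def cost(x, F):
    """Separable cost J(x) = sum_k F(|x(k)|)."""
    return sum(F(abs(v)) for v in x)


def restricted_cost(x, F, T):
    """Cost of the restriction of x to the index set T (0-based indices)."""
    return sum(F(abs(x[k])) for k in T)


def F_l0(t):
    """Indicator of t > 0, which turns J into the l0 'norm'."""
    return 1 if t > 0 else 0


def F_lp(p):
    """Return t -> t**p, which turns J into the p-th power of the lp norm."""
    return lambda t: t ** p


x = [0, 3, -4, 0]
sparsity = cost(x, F_l0)          # number of non-zero entries
l1_norm = cost(x, F_lp(1))        # |3| + |-4|
half_cost = cost(x, F_lp(0.5))    # sqrt(3) + sqrt(4)
```

Splitting the index set into $T$ and $T^c$ splits the cost additively, so `restricted_cost(x, F, T)` plus the cost on the complement equals `cost(x, F)`.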

The conditions ERC and RRC are commonly formulated as follows, see for example [18]. 

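Definition 1 (Exact recovery condition): In the noiseless case, the sparse signal is retrieved via the following
optimization:

$$\min_{\mathbf{x} \in \mathbb{R}^n} J(\mathbf{x}) \quad \text{s.t.} \quad \mathbf{A}\mathbf{x} = \mathbf{y}. \tag{5}$$

We say $\mathbf{A}$, $J$ satisfy the exact recovery condition (ERC) if for any measurement $\mathbf{y} = \mathbf{A}\bar{\mathbf{x}}$, where $\bar{\mathbf{x}}$ is $k$-sparse,
the vector $\bar{\mathbf{x}}$ is also the unique solution to (5).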

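Definition 2 (Robust recovery condition): In the noisy measurement ($\mathbf{v} \neq \mathbf{0}$) case, the sparse signal is retrieved
via the following optimization:

$$\min_{\mathbf{x}} J(\mathbf{x}) \quad \text{s.t.} \quad \|\mathbf{A}\mathbf{x} - \mathbf{y}\| \le \epsilon, \tag{6}$$

where $\epsilon \in \mathbb{R}^+$ is a constant chosen to tolerate the noise. We say that the robust recovery condition (RRC) is satisfied
if the following holds. For any $k$-sparse signal $\bar{\mathbf{x}}$, noise $\mathbf{v}$ satisfying $\|\mathbf{v}\| \le \epsilon$, and feasible solution $\hat{\mathbf{x}}$ satisfying
$J(\hat{\mathbf{x}}) \le J(\bar{\mathbf{x}})$, we have

$$\|\hat{\mathbf{x}} - \bar{\mathbf{x}}\| < C\epsilon, \tag{7}$$

where $C$ is a constant.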

$^1$ Strictly speaking, the $\ell_0$ norm and the $\ell_p$ ($0 < p < 1$) norm defined here do not satisfy the definition of a norm in mathematics.

B. Null Space Property

The null space property [20], [21], [8] is useful for the analysis of a special class of cost functions, which we 
introduce as follows: 
Definition 3 (sparseness measure): A function

$$F : [0, +\infty) \to [0, +\infty) \tag{8}$$

is called a sparseness measure if the following two conditions are satisfied:

• $F(|\cdot|)$ is sub-additive on $\mathbb{R}$;

• $F(x) = 0$ if and only if $x = 0$.

We denote by $\mathcal{M}$ the set of all sparseness measures.$^2$

In this paper we assume that the function $F$ is a sparseness measure as in Definition 3. This is a rather loose
assumption, so that the key optimization problems in many of the sparse recovery algorithms can be subsumed in
our framework, including $\ell_p$-minimization and the ZAP algorithm. The definition is also quite natural, since it can be
checked that $F$ is a sparseness measure if and only if its corresponding cost function $J$ induces a metric on $\mathbb{R}^n$
via $d(\mathbf{x}, \mathbf{y}) := J(\mathbf{x} - \mathbf{y})$.

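When $F \in \mathcal{M}$, the null space property (NSP) turns out to be equivalent to ERC:

Lemma 1 (Null space property [8, Lemma 6]): If $F \in \mathcal{M}$, then a necessary and sufficient condition for ERC is

$$J(\mathbf{z}_T) < J(\mathbf{z}_{T^c}), \quad \forall \mathbf{z} \in \mathcal{N}(\mathbf{A}) \setminus \{\mathbf{0}\},\ |T| \le k, \tag{9}$$

where $\mathcal{N}(\mathbf{A})$ denotes the null space of $\mathbf{A}$.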
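
For a one-dimensional null space spanned by a single vector, the inequality in (9) can be probed numerically. The sketch below is an illustration under stated assumptions, not a proof: because a general $F$ is not homogeneous, the inequality must hold for every scaling of the spanning vector, and the code only tests a finite, user-chosen grid of scalings. The name `nsp_holds_on_line` and the `scales` argument are hypothetical.

```python
from itertools import combinations


def nsp_holds_on_line(z, F, k, scales):
    """Check J(w_T) < J(w_{T^c}) for w = t*z, all t in `scales`, all 1 <= |T| <= k.

    Passing this test is only necessary evidence for the null space property
    on the line spanned by z, since finitely many scalings are examined.
    """
    n = len(z)
    for t in scales:
        w = [abs(t * v) for v in z]
        for size in range(1, k + 1):
            for T in combinations(range(n), size):
                on_T = sum(F(w[i]) for i in T)
                off_T = sum(F(w[i]) for i in range(n) if i not in T)
                if not on_T < off_T:
                    return False
    return True


def identity(t):
    """F(t) = t, i.e. the l1 cost."""
    return t


balanced = nsp_holds_on_line([1, 1, 1], identity, k=1, scales=[0.1, 1, 10])
unbalanced = nsp_holds_on_line([2, 1, 1], identity, k=1, scales=[0.1, 1, 10])
```

With the $\ell_1$ cost and $k = 1$, the direction $(1,1,1)$ passes the check, while $(2,1,1)$ fails because its largest entry equals the sum of the other two, so the strict inequality is violated.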

It is useful to define the null space constant [8], especially when one wants to study $\ell_p$-minimization or to compare
it with F-minimization:

Definition 4 (Null space constant, NSC): Suppose $F \in \mathcal{M}$, $q \in (0, 1]$. The null space constant is defined
as:

$$\theta_J := \sup_{\mathbf{z} \in \mathcal{N}(\mathbf{A}) \setminus \{\mathbf{0}\}} \ \max_{|T| \le k} \ \frac{J(\mathbf{z}_T)}{J(\mathbf{z}_{T^c})}. \tag{10}$$

In the same spirit, we denote by $\theta_{\ell_p}$ the null space constant associated with the $\ell_p$ cost function.
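
As a worked example, reading the null space constant as the supremum over non-zero null-space vectors of the largest ratio $J(\mathbf{z}_T)/J(\mathbf{z}_{T^c})$ with $|T| \le k$, the $\ell_p$ case on a one-dimensional null space reduces to a closed-form computation: the ratio is invariant under scaling, and it is maximized by placing the $k$ largest entries in $T$. The function name `nsc_lp_on_line` is hypothetical, and the sketch assumes a non-zero spanning vector.

```python
import math


def nsc_lp_on_line(z, p, k):
    """Null space constant of the lp cost on the line spanned by z.

    The ratio sum_{T}|z_i|^p / sum_{T^c}|z_i|^p does not change when z is
    rescaled, and moving an index into T can only increase it, so the maximum
    over |T| <= k is attained by the k largest magnitudes.
    """
    powers = sorted((abs(v) ** p for v in z), reverse=True)
    top = sum(powers[:k])
    rest = sum(powers[k:])
    return math.inf if rest == 0 else top / rest


theta_balanced = nsc_lp_on_line([1, 1, 1], p=1, k=1)    # 1 / (1 + 1)
theta_boundary = nsc_lp_on_line([2, 1, 1], p=1, k=1)    # 2 / (1 + 1)
theta_sqrt = nsc_lp_on_line([4, 1, 1], p=0.5, k=1)      # 2 / (1 + 1)
```

The last two lines show the same direction-dependent behaviour from two angles: $(2,1,1)$ sits exactly at the threshold value 1 for $p = 1$, and $(4,1,1)$ sits at the threshold for $p = 1/2$ although its $\ell_1$ ratio is 2.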

The null space constant is closely associated with NSP, and hence characterizes the performance of
F-minimization. We have the following result, which is a direct consequence of Definition 4 and Lemma 1.

Lemma 2:

1) $\theta_J \le 1$ is a necessary condition for ERC;

2) $\theta_J < 1$ is a sufficient condition for ERC.

In the case of $\ell_p$-minimization, one can obtain the following characterization (cf. [19]), which is more exact than
the case of F-minimization as described in Lemma 2:

Lemma 3: For $\ell_p$ cost functions, $\theta_{\ell_p} < 1$ is a necessary and sufficient condition for ERC.

$^2$ For our purpose, the definition of sparseness measure in this paper does not need to require that $F(x)/x$ is non-increasing. A comparison
with other definitions of the sparseness measure is given in Section V, Part B.

C. Preliminaries of the Grassmann Manifold

In this part we briefly review some relevant properties of the Grassmann manifold. More detailed treatment of
the subject can be found in many standard texts, such as [22], [23]. The main reason for considering this object is
that the property of exact recovery of a particular measurement matrix is completely determined by its null space,
by Lemma 1. Of course, $\mathcal{N}(\mathbf{A})$ is an $l := n - m$ dimensional linear subspace of $\mathbb{R}^n$ when $\mathbf{A}$ is of full rank.

Geometrically, the Grassmann manifold $G_l(\mathbb{R}^n)$ can be conceived as the collection of all the $l$-dimensional
subspaces ($l$-planes) of $\mathbb{R}^n$. One can introduce a topology on $G_l(\mathbb{R}^n)$ by defining a metric on it: for arbitrary
$\nu, \nu' \in G_l(\mathbb{R}^n)$, the distance between $\nu$ and $\nu'$ can be defined as [24]:

$$d(\nu, \nu') := \|\mathbf{P}_{\nu} - \mathbf{P}_{\nu'}\|, \tag{11}$$

where $\mathbf{P}_{\nu}$ (resp. $\mathbf{P}_{\nu'}$) is the projection matrix onto $\nu$ (resp. $\nu'$), and $\|\cdot\|$ denotes the spectral norm. The Grassmann
manifold is then a compact metric space.
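
For intuition in the simplest case $l = 1$, the sketch below computes the distance between two lines through the origin, assuming the distance is the spectral norm of the difference of the two orthogonal projection matrices (this reading of (11) is our assumption). For rank-one projections the non-zero eigenvalues of the difference are $\pm\sin\theta$, where $\theta$ is the angle between the lines, so no matrix computation is needed. The function name `line_distance` is hypothetical.

```python
import math


def line_distance(u, v):
    """Spectral norm of P_u - P_v for the lines spanned by non-zero u and v.

    For two rank-one orthogonal projections this equals sin(theta), with
    theta the angle between the lines; the sign and length of u, v are
    irrelevant because only the spanned subspaces matter.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v)
    return math.sqrt(max(0.0, 1.0 - cosine * cosine))


same_line = line_distance([1, 0, 0], [-2, 0, 0])      # same subspace
orthogonal = line_distance([1, 0, 0], [0, 1, 0])      # perpendicular lines
diagonal = line_distance([1, 0], [1, 1])              # 45 degrees apart
```

The distance is bounded by 1, attained by perpendicular lines, which is consistent with the manifold being a compact metric space.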

We shall next define the coordinates on $G_l(\mathbb{R}^n)$ to introduce its differential manifold structure. Let $F(n, l)$
be the set of all non-degenerate (full column rank) $n \times l$ matrices, and let $\sim$ be the following equivalence relation: If
$\mathbf{X}, \mathbf{Y} \in F(n, l)$, then $\mathbf{X} \sim \mathbf{Y}$ means $\mathbf{X}, \mathbf{Y}$ are equivalent up to a non-degenerate column transform, i.e. $\mathbf{X}, \mathbf{Y}$ span
the same linear subspace. Hence the Grassmann manifold can be defined as a quotient space $G_l(\mathbb{R}^n) := F(n, l)/\sim$,
for which we denote by $\pi : F(n, l) \to G_l(\mathbb{R}^n)$ the associated natural projection. For any arbitrary collection of
indices $1 \le i_1 < i_2 < \cdots < i_l \le n$, let $1 \le \bar{i}_1 < \bar{i}_2 < \cdots < \bar{i}_{n-l} \le n$ be the remaining indices. Given an index
set $I = \{i_1, i_2, \ldots, i_l\}$, we denote by $\mathbf{X}_I$ the $l \times l$ sub-matrix formed by the rows of $\mathbf{X}$ indexed by $I$. Define

$$U_I := \{\mathbf{X} \in F(n, l) : \det \mathbf{X}_I \neq 0\}, \quad V_I := \pi(U_I). \tag{12}$$

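Then $\{V_I\}$ constitutes an open covering of $G_l(\mathbb{R}^n)$. For any arbitrarily chosen $\mathbf{Y} \in \pi^{-1}(\nu)$, where $\nu \in V_I$, the
following matrix is $\sim$ equivalent to $\mathbf{Y}$, and is independent of the specific choice of $\mathbf{Y}$ which represents $\nu$:

$$\mathbf{X} = \mathbf{Y} \cdot (\mathbf{Y}_I)^{-1} = \begin{pmatrix} \mathbf{I}_l \\ \mathbf{X}_{\bar{I}} \end{pmatrix}
\quad \begin{matrix} \text{rows } (i_1), \ldots, (i_l) \\ \text{rows } (\bar{i}_1), \ldots, (\bar{i}_{n-l}) \end{matrix} \tag{13}$$

where $\mathbf{I}_l$ is the $l \times l$ identity matrix and $\mathbf{X}_{\bar{I}}$ is the $(n-l) \times l$ block formed by the rows of $\mathbf{X}$ indexed by the remaining indices.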

Note that in the above, we have performed a row permutation to the last matrix for clarity of display. Define
$\phi_I : V_I \to M(n - l, l)$, $\nu \mapsto \mathbf{X}_{\bar{I}}$, where $\mathbf{X}_{\bar{I}}$ consists of the rows of $\mathbf{X}$ indexed by the remaining indices; then
$\{(V_I, \phi_I) : 1 \le i_1 < \cdots < i_l \le n\}$ forms an atlas of $G_l(\mathbb{R}^n)$.
Concepts such as open sets and interior are well-defined once a topology on $G_l(\mathbb{R}^n)$ has been unambiguously
chosen. One might notice that there are possibly two topologies defined on $G_l(\mathbb{R}^n)$: the metric topology arising
from the metric defined in (11), and the manifold topology (which is connected to the standard topology on $\mathbb{R}^{ml}$
by all the homeomorphisms $\{\phi_I\}$). Unsurprisingly these two topologies agree, since standard calculations would
show that the metric on $U_I$ induced from the Euclidean metric on $\phi_I(U_I)$ is topologically equivalent to the metric
defined in (11).
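
The chart construction can be made concrete in the case $l = 1$, where a representative $\mathbf{Y}$ is a single non-zero column, $\mathbf{Y}_I$ is one of its entries, and the chart coordinates are the remaining entries after normalizing that entry to 1. The following is a minimal sketch under that assumption; the function name `chart_coordinates` is hypothetical, and exact rationals are used so that the independence from the chosen representative can be checked with equality.

```python
from fractions import Fraction


def chart_coordinates(y, i):
    """Chart coordinates of the line spanned by y, in the chart where y[i] != 0.

    This mirrors X = Y * (Y_I)^{-1} for l = 1: divide by the i-th entry and
    keep the other n - 1 entries as local coordinates.
    """
    if y[i] == 0:
        raise ValueError("the line does not lie in this chart")
    pivot = Fraction(y[i])
    return [Fraction(v) / pivot for j, v in enumerate(y) if j != i]


coords_a = chart_coordinates([2, 4, 6], 0)
coords_b = chart_coordinates([-1, -2, -3], 0)   # another spanning vector, same line
```

Both representatives span the same line and therefore yield the same coordinates, which is the independence property stated above; a line with a vanishing first entry is simply covered by a different chart.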

Further, since $G_l(\mathbb{R}^n)$ is a $C^\infty$ (therefore differentiable) manifold, the concept of a measure zero set can be defined
as follows:

Definition 5: [23, Definition 1.16] A subset $A$ of a differentiable manifold has measure zero if $\phi(A \cap U)$ has
Lebesgue measure zero for every chart $(U, \phi)$.

There is a unique (up to a scalar factor) rotationally invariant measure on $G_l(\mathbb{R}^n)$, i.e., the Haar measure. The
requirement that a set $A$ has zero Haar measure agrees with Definition 5.$^3$ The Haar measure is of practical
importance, since it coincides with the distribution of the null space of $\mathbf{A}$ when $\mathbf{A}$ is a Gaussian random matrix.
We use $\mu$ to denote the normalized Haar measure on $G_l(\mathbb{R}^n)$, which can also be understood as a probability.

III. The Relationship between ERC and RRC
A. A Topological Characterization of RRC 

We have mentioned earlier that NSP is a necessary and sufficient condition for ERC. If $\mathbf{A} \in M(m, n)$ is in a
general position (i.e., the rows of $\mathbf{A}$ are linearly independent), then $\mathbf{A}$ is of full rank, and $\mathcal{N}(\mathbf{A})$ is an
$l$-dimensional subspace of $\mathbb{R}^n$ (recall that $l = n - m$). Therefore almost every measurement matrix (except for the
set of $\mathbf{A}$'s not in a general position, which is of Lebesgue measure zero) corresponds to an element in $G_l(\mathbb{R}^n)$;
and this element is sufficient to determine whether NSP, and therefore ERC, is satisfied. By Lemma 1, the set of
null spaces such that ERC is satisfied is as follows:

$$\Omega_J := \{\nu \in G_l(\mathbb{R}^n) : J(\mathbf{z}_T) < J(\mathbf{z}_{T^c}),\ \forall \mathbf{z} \in \nu \setminus \{\mathbf{0}\},\ |T| \le k\}. \tag{14}$$

If two cost functions induced from the sparseness measures $F, G \in \mathcal{M}$ satisfy the following condition

$$\Omega_{J_G} \subseteq \Omega_{J_F}, \tag{15}$$

then ERC for G-minimization implies ERC for F-minimization, i.e., $F$ is a better sparseness measure than $G$ in the sense
of ERC. In the light of this we can describe and compare the performances of different sparseness measures in
terms of ERC by a simple set inclusion relation like (15).

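$^3$ This is because the Haar measure on $G_l(\mathbb{R}^n)$ can be associated with an exterior differential form $dX$ and an orientation (cf. [25, Section
1.4]) such that $\mu(A) = \int_A dX$. The integral is defined as follows: given an arbitrary oriented atlas $\{(U_\alpha, \phi_\alpha) \mid \alpha \in \Lambda\}$ and a partition of unity
$\{\eta_\alpha \mid \alpha \in \Lambda\}$ obeying:

$$\eta_\alpha \ge 0, \quad \mathrm{supp}\, \eta_\alpha \subset U_\alpha, \quad \sum_{\alpha \in \Lambda} \eta_\alpha = 1,$$

then the probability of $A$ is given by

$$\int_A dX = \sum_{\alpha \in \Lambda} \int_{\phi_\alpha(A \cap U_\alpha)} f_\alpha(\phi_\alpha^{-1}(x))\, \eta_\alpha \, dx_1 \cdots dx_{ml},$$

where $f_\alpha \in C^\infty(G_l(\mathbb{R}^n))$ is a positive function such that $dX|_{U_\alpha} = f_\alpha \bigwedge_{i=1}^{ml} dx_i$. Since the last integral is the Lebesgue integral of a positive
function on $\mathbb{R}^{ml}$, we deduce that $\lambda(\phi_\alpha(A \cap U_\alpha)) = 0$ if and only if the integral vanishes.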
In Lemma 1, the necessary and sufficient condition for exact recovery is fully characterized by the structure of
the null space. Inspired by this fact we now provide a necessary and sufficient condition for robust recovery:

Theorem 1: Consider the minimization problem in (6). The RRC holds if and only if there exists $d > 0$ such
that for each $\mathbf{z} \in \mathcal{N}(\mathbf{A}) \setminus \{\mathbf{0}\}$, $\mathbf{n} \in \mathbb{R}^n$, and $T \subseteq \{1, \ldots, n\}$ satisfying $\|\mathbf{n}\| < d\|\mathbf{z}\|$ and $|T| \le k$, we have the following:

$$J(\mathbf{z}_T + \mathbf{n}_T) < J(\mathbf{z}_{T^c} + \mathbf{n}_{T^c}). \tag{16}$$

Proof: See Appendix A. ■

We remark that RRC trivially implies ERC, as can be seen in their definitions (letting $\mathbf{v} = \mathbf{0}$ in the definition
of RRC would result in the definition of ERC), as well as in Theorem 1 (letting $\mathbf{n} = \mathbf{0}$).

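From Theorem 1 it is clear that the property of robust recovery of a particular matrix is also completely determined
by its null space. Moreover, it implies that the subset of $G_l(\mathbb{R}^n)$ that guarantees RRC is the following:

$$\Omega_J^r := \{\nu \in G_l(\mathbb{R}^n) : \exists d > 0 \ \text{s.t.}\ J(\mathbf{z}_T + \mathbf{n}_T) < J(\mathbf{z}_{T^c} + \mathbf{n}_{T^c}),\ \forall \mathbf{z} \in \nu \setminus \{\mathbf{0}\},\ \|\mathbf{n}\| < d\|\mathbf{z}\|,\ |T| \le k\}. \tag{17}$$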

The connection between ERC and RRC is not immediately clear from Lemma 1 and Theorem 1. However, there
is a nice relation between these two conditions once we take the perspective of point set topology:

Theorem 2: With the standard topology on $G_l(\mathbb{R}^n)$, the following relation holds:

$$\Omega_J^r = \mathrm{int}(\Omega_J). \tag{18}$$

Proof: See Appendix B. ■
Two questions then arise: are the conditions ERC and RRC equivalent for generic cost functions? If not, how
much do they differ from each other? We shall first address the former question in the remainder of this part, while
the second question will be discussed in Part B. In the special case of $\ell_p$-minimization, these two conditions are
indeed equivalent [18], as discussed in the introductory section. In view of Theorem 1, we can show this result
by simply proving that $\Omega_{\ell_p}$ is an open set in the case of $\ell_p$-minimization. We first note the following basic fact
about generic continuous functions. (It is stated in a slightly stronger and more complete manner than needed for
obtaining our final result.)

Lemma 4: Suppose $X$, $M$ are metric spaces. If $f : X \times M \to \mathbb{R}$ is continuous, then $g : X \to \mathbb{R}$,
$x \mapsto \max_{y \in M} f(x, y)$ is lower semi-continuous on $X$. Further, if $M$ is compact, then $g$ is also continuous.

Proof: See Appendix C. ■ 

The following result about the null space constant $\theta$, now conceived as a map from $G_l(\mathbb{R}^n)$ to the
real numbers, then follows:

Corollary 1: If $F$ is continuous, then $\theta_J : G_l(\mathbb{R}^n) \to [0, +\infty)$ is a lower semi-continuous function. Further,
$\theta_{\ell_p} : G_l(\mathbb{R}^n) \to [0, +\infty)$ is a continuous function.

The openness of $\Omega_{\ell_p}$ then follows easily from the very definition of continuous functions: the pre-images
of open sets are open.

Corollary 2: If $0 < p \le 1$, then $\Omega_{\ell_p}$ is open, hence $\Omega_{\ell_p}^r = \Omega_{\ell_p}$.
Remark 1: The equivalence result $\Omega_{\ell_p}^r = \Omega_{\ell_p}$ in the above is essentially 'non-topological', since it does not
involve the concept of open sets on the Grassmann manifold. A comparison of different proof methods can be
found in Section V, Part A.

Proof: By Corollary 1, the function $\theta_{\ell_p}$ is continuous with respect to $\nu$. Since $\Omega_{\ell_p}$ is the pre-image of $(-\infty, 1)$
under the continuous mapping $\theta_{\ell_p}$ (Lemma 3), we conclude that $\Omega_{\ell_p}$ is open, hence $\Omega_{\ell_p}^r = \mathrm{int}(\Omega_{\ell_p}) = \Omega_{\ell_p}$. ■

Next we shall show an example in which RRC is strictly stronger than ERC, i.e., $\Omega_J^r \subsetneq \Omega_J$.

Proposition 1: The function

$$F(x) := x + 1 - e^{-x} \tag{19}$$

defined on $[0, +\infty)$ is a sparseness measure. Suppose that $x, y > 0$, $z = x + y$, $k = 1$, and that the null space of
the measurement matrix is the following one-dimensional linear subspace of $\mathbb{R}^3$:

$$\mathcal{N} := [x, y, z]^\top, \tag{20}$$

where the homogeneous coordinates $[x, y, z]^\top$ denote the subspace spanned by $(x, y, z)^\top$. Conclusion: in this setting
ERC is satisfied, but not RRC.

Proof: Let $\mathbf{z} = (x, y, z)$. Since $|z| > |x|, |y|$ and $F(x) + F(y) > F(z)$, for any $T$ such that $|T| = 1$ we have:

$$J(\mathbf{z}_T) < J(\mathbf{z}_{T^c}). \tag{21}$$

Hence NSP is satisfied, and ERC must hold. On the other hand, for any $0 < d < 1$ there exists $t > 0$ such that

$$F((1 - d)xt) + F(yt) < F(zt). \tag{22}$$

Now in Theorem 1, take $\mathbf{z} = (xt, yt, zt)^\top$, $T = \{3\}$, and $\mathbf{n} = (-dxt, 0, 0)$. On the one hand we have $\|\mathbf{n}\|/\|\mathbf{z}\| < d$;
on the other hand (16) does not hold because of (22). Therefore RRC is not fulfilled as a result of Theorem 1. ■
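
The two halves of this argument can be checked numerically for a concrete choice of the direction. The sketch below assumes $F(x) = x + 1 - e^{-x}$, takes $x = 1$, $y = 2$, $z = 3$ as an arbitrary illustrative choice, tests the strict null space inequality on a finite grid of scalings (evidence, not a proof), and searches by doubling for a scaling $t$ at which the perturbed inequality fails for a given $d$. The function names are hypothetical.

```python
import math


def F(x):
    """Sparseness measure assumed in this example: F(x) = x + 1 - exp(-x)."""
    return x + 1.0 - math.exp(-x)


def erc_holds_on_grid(direction, scales):
    """Check F(|w_j|) < sum_{i != j} F(|w_i|) for w = t * direction, k = 1."""
    for t in scales:
        w = [abs(t * c) for c in direction]
        for j in range(len(w)):
            off_T = sum(F(w[i]) for i in range(len(w)) if i != j)
            if not F(w[j]) < off_T:
                return False
    return True


def violating_scale(x, y, d, t=1.0, max_doublings=60):
    """Search for t with F((1-d) x t) + F(y t) < F((x+y) t) by doubling t.

    Such a t witnesses the failure of the perturbed inequality with
    T = {3} and perturbation n = (-d x t, 0, 0).
    """
    z = x + y
    for _ in range(max_doublings):
        if F((1.0 - d) * x * t) + F(y * t) < F(z * t):
            return t
        t *= 2.0
    return None


x, y, d = 1.0, 2.0, 0.1
exact_ok = erc_holds_on_grid([x, y, x + y], [0.01, 0.1, 1.0, 10.0, 100.0])
t_bad = violating_scale(x, y, d)
```

The search succeeds because $F(s)$ behaves like $s + 1$ for large $s$: the left-hand side of the perturbed inequality then loses $dxt$ relative to the right-hand side, which eventually outweighs the constant gap of 1 that made the unperturbed inequality strict.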

B. Equivalence Regained: the Probabilistic Equivalence 

While strict equivalence of ERC and RRC is lost when passing from $\ell_p$ cost functions to generic sparseness
measures, as demonstrated in Proposition 1, we will show in this part that the difference is only a set of measure
zero on the Grassmann manifold, at least for non-decreasing sparseness measures. First we take a closer look at
Example 1. Using the subadditivity property and the Taylor expansion of $F$ at the origin, one can explicitly write
out:

$$\Omega_J = \Big\{ [x_1, x_2, x_3] : 2 \max_{j=1,2,3} |x_j| \le \sum_{i=1,2,3} |x_i| \Big\}, \tag{23}$$

and

$$\Omega_J^r = \Big\{ [x_1, x_2, x_3] : 2 \max_{j=1,2,3} |x_j| < \sum_{i=1,2,3} |x_i| \Big\}. \tag{24}$$

We recall that $\mu$ denotes the Haar measure on $G_l(\mathbb{R}^n)$. From (23) and (24) it is intuitively clear in this simple case
that $\mu(\Omega_J) = \mu(\Omega_J^r)$, i.e. the set of null spaces satisfying ERC and the set of null spaces satisfying RRC differ
at most by a set of measure zero. Recall that the Haar measure agrees with the probability measure in the case
of i.i.d. Gaussian random entries, as described in Section II, Part C. This means that if $\mathbf{A}$ is a Gaussian random
matrix, then the probabilities of ERC and RRC are the same, even though the former is implied by the latter.
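
This can be illustrated by a small Monte Carlo experiment. The sketch below assumes that, in this example, the ERC and RRC sets are described by the non-strict and strict versions of the inequality $2\max_j |x_j| \lessgtr \sum_i |x_i|$ (our reading of (23) and (24)), and it samples the direction of the null space as a normalized isotropic Gaussian vector in $\mathbb{R}^3$, whose distribution on lines is rotationally invariant. The function name `estimate_probabilities` and its parameters are hypothetical.

```python
import random


def estimate_probabilities(num_samples=20000, seed=0):
    """Estimate the probabilities of the non-strict and strict inequalities.

    Each sample is an isotropic Gaussian vector in R^3; only its direction
    matters because both inequalities are invariant under scaling.
    """
    rng = random.Random(seed)
    erc_hits = 0
    rrc_hits = 0
    for _ in range(num_samples):
        z = [abs(rng.gauss(0.0, 1.0)) for _ in range(3)]
        lhs = 2.0 * max(z)
        rhs = sum(z)
        erc_hits += lhs <= rhs   # non-strict version
        rrc_hits += lhs < rhs    # strict version
    return erc_hits / num_samples, rrc_hits / num_samples


p_erc, p_rrc = estimate_probabilities()
```

The two estimates can only differ on samples that hit the boundary $2\max_j |x_j| = \sum_i |x_i|$ exactly, which is an event of probability zero under a continuous distribution.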

The general case tends to be much more complicated. Indeed, taking an arbitrary topological measurable space,
it is very well possible that the measure of a set is strictly greater than the measure of its interior. (Consider for
example the set of all irrational numbers, whose Lebesgue measure is $\infty$, but whose interior is empty.) In fact
merely $F \in \mathcal{M}$ does not guarantee $\mu(\Omega_J) = \mu(\mathrm{int}(\Omega_J))$, as will be shown in the remark at the end of this section.
However we can show the following result:

Theorem 3: Suppose $F \in \mathcal{M}$ is a non-decreasing function; then $\Omega_J \setminus \mathrm{int}(\Omega_J)$ is a measure zero set, that is,
$\mu(\Omega_J \setminus \Omega_J^r) = 0$.$^4$

Proof: See Appendix D. ■
Remark 2: Almost all commonly used F-minimizations (e.g. $\ell_p$-minimization, ZAP) satisfy the requirement of $F$
being non-decreasing, hence the non-decreasing assumption is a very mild one. On the other hand, we remark that
the non-decreasing requirement is also essential for the validity of Theorem 3. To see this, consider the following
example: Define

$$F(x) := \begin{cases} 0, & x = 0; \\ 0.1, & x > 0 \text{ and } x \text{ is rational}; \\ 1, & x > 0 \text{ and } x \text{ is irrational}, \end{cases} \tag{25}$$

and set $m = 2$, $n = 3$, $k = 1$. It can then be verified that $F$ satisfies the definition of sparseness measure in
Definition 3. Moreover, for arbitrary $x_1, x_2 \in \mathbb{R}$, denote by $x_1 \sim x_2$ the equivalence relation that either $x_1/x_2 \in
\mathbb{Q} \setminus \{0\}$ or $x_1 = x_2 = 0$ holds$^5$. Then for any $\nu \in G_1(\mathbb{R}^3)$, the three homogeneous coordinates of $\mathbf{z} \in \nu$ can
be grouped into equivalence classes according to $\sim$, and whether $\nu \in \Omega_J$ is completely determined by how these
coordinates are grouped. Now we say $\nu$ is of type (say) (1, 1, 2) if the first two homogeneous coordinates of $\nu$ are
of the same equivalence class and the third homogeneous coordinate is of another equivalence class. From the null
space property we can check that the type (1, 2, 3) is in $\Omega_J$, while (1, 1, 2) is not. Since the set of null spaces of the type
(1, 2, 3) is of measure 1, we have that $\mu(\Omega_J) = 1$. On the other hand, since the set of one-dimensional subspaces
corresponding to the type (1, 1, 2) is dense in $G_1(\mathbb{R}^3)$ and also does not intersect $\Omega_J$, the interior of $\Omega_J$ must be
empty, hence $\mu(\mathrm{int}(\Omega_J)) = 0$.

A trivial observation from Theorem 3 is that the probabilities of ERC and RRC are the same if the observation
matrix $\mathbf{A}$ has i.i.d. Gaussian entries, since in this case the probability agrees with the measure $\mu$. More generally,
suppose $P$ is the probability measure corresponding to the distribution of the null space of $\mathbf{A}$, and $P$ is absolutely
continuous with respect to $\mu$;$^6$ then $P(\Omega_J \setminus \Omega_J^r) = 0$. It is then not counter-intuitive that this should be true if
the entries of $\mathbf{A}$ are i.i.d. generated from a certain continuous distribution, which is a common practice in
generating the observation matrix. Nevertheless, the above speculation requires a formal justification. We formulate
this result as a corollary, the proof of which is deferred to Appendix E.

$^4$ Here the notation '$\setminus$' denotes the set minus.

$^5$ $\mathbb{Q}$ denotes the set of rational numbers.

$^6$ The measure $\mu_1$ is said to be absolutely continuous with respect to the measure $\mu_2$ if $\mu_2(E) = 0$ implies $\mu_1(E) = 0$, for an arbitrary
measurable set $E$.

Corollary 3: Suppose $F \in \mathcal{M}$ is a non-decreasing function, and the distribution of the matrix $\mathbf{A}$ is absolutely
continuous with respect to the Lebesgue measure on M(m, n). Then the probability of ERC and RRC are the same. 
This holds true in particular when A has i.i.d. entries drawn from a continuous distribution. 

Remark 3: Apart from the one described in Corollary 3, another popular method for the generation of $\mathbf{A}$ is by
randomly selecting $m$ rows of the $n \times n$ Fourier transform matrix [26], [27]. However, in this scheme the probabilities
of ERC and RRC may not agree, since the probability distribution of the null space is not continuous on $G_l(\mathbb{R}^n)$.



IV. Comparison of Different Sparseness Measures 

In this section we provide some methods to compare the performance between two sparseness measures in terms 
of ERC or RRC, as an application of the results from the previous sections. It turns out that both the topological 
characterization of RRC and the probabilistic (measure-theoretic) viewpoint become particularly useful when passing 
from the $\ell_p$ cost functions to general sparseness measures.

The following lemma comes from the corresponding result for ERC in [8] and our interior point characterization 
of RRC: 

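Lemma 5: Suppose $F, G \in \mathcal{M}$. If $F, G$ are non-decreasing and $F/G$ is non-increasing on $\mathbb{R}^+$, then we have
$\Omega_{J_G} \subseteq \Omega_{J_F}$, and $\Omega_{J_G}^r \subseteq \Omega_{J_F}^r$.

Proof: The fact that $\Omega_{J_G} \subseteq \Omega_{J_F}$ comes from [8, Lemma 7]. It then follows that $\Omega_{J_G}^r \subseteq \Omega_{J_F}^r$ from Theorem
2. ■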

The set inclusion formulas in Lemma 5 mean that the sparseness measure $F$ is better than $G$, in the sense that
whenever the cost function $J_G$ guarantees ERC/RRC, so does $J_F$. By letting $G(x) := x^p$ in this lemma we can
obtain the following result:

Corollary 4: Suppose $F \in \mathcal{M}$, $p \in (0, 1]$. If $F$ is non-decreasing and $F(x)/x^p$ is non-increasing on $\mathbb{R}^+$, then
we have $\Omega_{\ell_p} \subseteq \Omega_{J_F}$ and $\Omega_{\ell_p}^r \subseteq \Omega_{J_F}^r$.

Corollary 4 gives a condition such that $J_F$ is better than $\ell_p$ in the sense of ERC and RRC. Conversely, we shall
show that the asymptotic behavior of $F$ around $0^+$ and $+\infty$ gives a sufficient condition that $\ell_p$ is better than $J_F$ in terms
of probability.

Theorem 4: Suppose $F \in \mathcal{M}$, $p \in (0, 1]$. If $\lim_{x \to 0^+} F(x)/x^p$ or $\lim_{x \to \infty} F(x)/x^p$ exists and is positive, then
$\Omega_{J_F}^r \subseteq \Omega_{\ell_p}$, and $\mu(\Omega_{J_F}) \le \mu(\Omega_{\ell_p})$.

Proof: See Appendix F. ■

We remark that $\mu(\Omega_{J_F}) \le \mu(\Omega_{\ell_p})$ in Theorem 4 cannot be replaced by the stronger set inclusion relation
$\Omega_{J_F} \subseteq \Omega_{\ell_p}$, which holds for $\ell_p$ cost functions but fails for general sparseness measures. Thus the measure-theoretic
viewpoint allows us to restore a comparison criterion when extending $\ell_p$-minimization to F-minimization.
From the above result, we can immediately derive the relation between ZAP [2] and $\ell_1$-minimization. The typical
form of sparseness measure used in the ZAP algorithm is the following:

$$F(x) = \begin{cases} \alpha x - \alpha^2 x^2, & x < 1/\alpha; \\ 1, & \text{otherwise}, \end{cases} \tag{26}$$

where the tuning parameter $\alpha$ is usually chosen as the inverse of the standard deviation of the non-zero entries in
$\mathbf{x}$. Our following result says that, while ZAP performs far better than $\ell_1$-minimization in the average case, as shown
in the numerical experiments [2], the worst case performances (requiring that all sparse vectors can be reconstructed) of
the two cost functions are the same:

Corollary 5:

$$\mu(\Omega_{\mathrm{ZAP}}) = \mu(\Omega_{\ell_1}). \tag{27}$$

Proof: Using Corollary 4 and Theorem 4 with $p = 1$ one can obtain the lower and upper bounds on
$\mu(\Omega_{\mathrm{ZAP}})$, respectively. ■
We end this section by summarizing the relationship between the various requirements on $F$ appearing in this
section:

Proposition 2: Assuming that $0 < p \le 1$, $F : [0, +\infty) \to [0, +\infty)$, and $F(0) = 0$, we have

(1) $F$ is concave $\Rightarrow$ $F(t)/t$ is non-increasing;

(2) $F(t)/t^p$ is non-increasing $\Rightarrow$ $F(t)/t$ is non-increasing;

(3) $F(t)/t$ is non-increasing $\Rightarrow$ $F$ is sub-additive.
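
These implications can be sanity-checked on a finite grid. The sketch below is only a numerical illustration, not a proof: it tests monotonicity of $F(t)/t^{\text{power}}$ and sub-additivity at finitely many points, with a small tolerance for rounding. The helper names, the grid, and the two test functions (the concave $\sqrt{t}$ and the convex counterexample $t^2$) are choices made for this example.

```python
import math


def ratio_is_nonincreasing(F, grid, power=1.0, tol=1e-12):
    """Check that F(t) / t**power does not increase along an increasing grid."""
    ratios = [F(t) / t ** power for t in grid]
    return all(b <= a + tol for a, b in zip(ratios, ratios[1:]))


def is_subadditive_on_grid(F, grid, tol=1e-12):
    """Check F(s + t) <= F(s) + F(t) for all pairs of grid points."""
    return all(F(s + t) <= F(s) + F(t) + tol for s in grid for t in grid)


grid = [0.1 * i for i in range(1, 101)]   # 0.1, 0.2, ..., 10.0

# F(t) = sqrt(t): F(t)/t^p is constant for p = 1/2, so items (2) and (3) apply.
sqrt_ok = (
    ratio_is_nonincreasing(math.sqrt, grid, power=0.5)
    and ratio_is_nonincreasing(math.sqrt, grid, power=1.0)
    and is_subadditive_on_grid(math.sqrt, grid)
)


def square(t):
    """Convex counterexample: F(t)/t increases and sub-additivity fails."""
    return t * t


square_ratio = ratio_is_nonincreasing(square, grid)
square_subadditive = is_subadditive_on_grid(square, grid)
```

The convex function $t^2$ fails both tests, which is consistent with item (3): sub-additivity cannot be expected once $F(t)/t$ is allowed to increase.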

V. Comparison with Other Works 
A. The ERC/RRC Equivalence for $\ell_p$-minimization

To the best of our knowledge, the exact characterization of robustness of /p -minimization first appeared in [19], 
where the definition of robustness is the same as in our paper. In [19] a variant of the nuU space property, called 
NSP', was proposed as a sufficient condition for the robustness of Ip minimization. The NSP' is obviously stronger 
than NSP, but the reverse situation is not innmediately clear. Later Aldroubi et al adopted the same approach in 
[18], and proved that NSP and NSP' are in fact equivalent (see also [28]). The proof method in [18] requires a 
lemma from matrix analysis [18, Lemma 2.1]. We remark that this lemma, from a sUghtly more general viewpoint, 
can be seen as a classical appUcation of the open mapping theorem in functional analysis [29, Chapter 4, Corollary 
3.2]. Thus it is established that NSP, NSP', ERC and RRC are all equivalent for Zp-minimization. 

While the NSP' approach works well for the $l_p$ case, it is hard to extend to the general $F$-minimization problem. This is because NSP' consists of a homogeneous inequality, which appears to work well only for homogeneous cost functions such as the $l_p$ norm. In contrast, the heart of our approach is the interior point characterization of RRC (Theorem 2) for the general $F$-minimization problem. Our proof of the ERC/RRC equivalence for $l_p$-minimization, although it involves some basic facts about topological spaces, then follows almost immediately as a corollary. Note that this application is particularly interesting since the statement of the ERC/RRC equivalence does not involve topology at all. Nevertheless, we emphasize that the significance of Theorem 2 is to provide a simple, accurate, and general characterization of the robustness of $F$-minimization; the proof of the ERC/RRC equivalence for $l_p$ is one of its applications in a special setting.

B. The Notion of Sparseness Measure 

The sparseness measure defines the class of cost functions of interest, and is therefore of great importance. In general we want to consider a class wide enough to cover most applications, but also small enough to possess important recovery properties. Intuitively, the cost function should penalize non-zero coefficients, and not penalize the zero coefficients. However, there are additional reasonable requirements, the precise definitions of which differ in the literature. For clarification we compare these different requirements on $F$ as follows (recall that $\mathcal{M}$ denotes the set of sparseness measures defined in Definition 3):

• $F \in \mathcal{M}$. This is the class of functions mainly considered in our paper as well as in [18]. This seems to be the most general class of functions that can be studied by the null space property.

• $F \in \mathcal{M}$ and $F$ is non-decreasing. This requirement appears in Theorem 3. As shown by the counterexample in the remark following the theorem, the assumption that $F$ is non-decreasing cannot be dropped.

• $F \in \mathcal{M}$, $F$ is non-decreasing, and $F(t)/t$ is non-increasing. This requirement is considered in [8], [30], and it guarantees that the cost function $J_F$ is better than the $l_1$ norm in the sense of ERC. There is also another nice property relating to the composition of two functions in this class [8, Lemma 7]. Finally, the $l_1$ norm is the only convex cost function whose corresponding $F$ satisfies this definition of sparseness measure [30, Proposition 2.1].

VI. Conclusion 

$F$-minimization refers to a broad family of non-convex optimizations for sparse recovery which has outperformed conventional $l_1$-minimization experimentally. However, because of some technical difficulties, the robustness of $F$-minimization was not fully understood before, even though its exact recovery property has been studied using the null space property. The novel approach of this paper is to view the collection of null spaces as a topological manifold, called the Grassmann manifold, and to provide an exact characterization of the relationship between the robust recovery condition (RRC) and the exact recovery condition (ERC): the set of null spaces satisfying RRC is the interior of the set satisfying ERC (Theorem 2). Building on this characterization, the previous result on the equivalence of exact recovery and robust recovery in $l_p$-minimization follows as an easy consequence. Besides some rather direct applications of our interior characterization of RRC, such as the comparison of different sparseness measures, we showed another main result: if $F$ is non-decreasing, then the sets associated with ERC and RRC differ by a set of measure zero (Theorem 3). The practical significance of this result is that ERC and RRC occur with equal probability when the measurement matrix is randomly generated according to a continuous distribution.

Footnote: The assumption that $F(t)/t$ is non-increasing guarantees that $F$ is sub-additive, as shown in Proposition 1.

Further improvements may include finding conditions on $F$ more general than being non-decreasing under which Theorem 3 still holds. Studies of the robustness under perturbation of the measurement matrix may also be of interest.

Appendix A 
Proof of Theorem 1 

Sufficiency: Suppose $\hat{x}$ is the recovered signal. From the constraint of the optimization we have

$$\|A(\hat{x} - x)\| \le \|A\hat{x} - y\| + \|Ax - y\| \le 2\epsilon. \qquad (28)$$

Define $u := \hat{x} - x$; from the optimality of $\hat{x}$ we have

$$J(u_T) \ge J(u_{T^c}). \qquad (29)$$

Decompose $u = z + n$, such that $z$ belongs to the null space of $A$. The above inequality is in contradiction with (16), hence from the assumption we must have:

$$\|n\| \ge d\|z\|. \qquad (30)$$

Therefore

$$2\epsilon \ge \|A(\hat{x} - x)\| = \|An\| \ge \sigma_{\min}\|n\| \ge \sigma_{\min}\frac{d}{1+d}\|u\|,$$

where $\sigma_{\min}$ is the smallest singular value of $A$. Thus RRC holds.
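The last step of the chain uses a bound of $\|u\|$ by $\|n\|$ that is left implicit. A short derivation, assuming only the decomposition $u = z + n$, inequality (30), and $\sigma_{\min}\|n\| \le \|An\|$ as used in the text (with $\hat{x}$ the recovered signal and $x$ the true one), could read:

```latex
\begin{align*}
\|u\| &\le \|z\| + \|n\|
       \le \Bigl(\tfrac{1}{d} + 1\Bigr)\|n\|
       = \tfrac{1+d}{d}\,\|n\|
  && \text{(triangle inequality and (30))},\\
\|\hat{x} - x\| = \|u\|
      &\le \tfrac{1+d}{d}\,\|n\|
       \le \tfrac{1+d}{d\,\sigma_{\min}}\,\|An\|
       \le \tfrac{2(1+d)}{d\,\sigma_{\min}}\,\epsilon
  && \text{(using (28))}.
\end{align*}
```

The reconstruction error is therefore bounded by a constant multiple of the noise level $\epsilon$, with a constant that depends only on $d$ and $\sigma_{\min}$.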
Necessity: We argue by contradiction. Assuming that

$$\forall d > 0,\ \exists\, \|n\| < d\|z\|,\ z \in \mathcal{N}(A),\ \text{such that } J(z_T + n_T) \ge J(z_{T^c} + n_{T^c}), \qquad (31)$$

we will show that the recovery is not robust. To do this, we will construct $x_1$, $x_2$ with $J(x_2) \le J(x_1)$, and $v$ with $\|v\| = \epsilon$, $\|Ax_1 - (Ax_2 + v)\| = \epsilon$; but

$$\|x_1 - x_2\| \ge \frac{2(1-d)}{d\,\|A\|}\,\epsilon. \qquad (32)$$

Since $d$ is arbitrary and hence unbounded from below, the constant will be unbounded from above.

For any $d$, choose $n$, $z$ satisfying (31). Define $u := z + n$, $x_1 = (u_T)^T$, $x_2 = -(u_{T^c})^{T^c}$, $v = A(x_1 - x_2)/2$, with $\|v\| = \epsilon$. Then $\|Ax_1 - (Ax_2 + v)\| = \epsilon$. Hence

$$2\epsilon = \|A(x_1 - x_2)\| = \|An\| \le \|A\|\,\|n\| \le \|A\|\,\frac{d}{1-d}\,\|u\|.$$

Thus the relation (32) holds, as desired.
Appendix B 
Proof of Theorem 2 

Lemma 6: Suppose $\nu \in G_l(\mathbb{R}^n)$. For all $z \in \nu \setminus \{0\}$ and $\|n\| < d\|z\|$, there exists $\nu' \in G_l(\mathbb{R}^n)$ such that $z + n \in \nu'$ and $d(\nu, \nu') < d$.

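Proof: Since distances are preserved under a rotation, we can assume without loss of generality that $\nu = \mathrm{span}(e_1, e_2, \ldots, e_l)$ and $\mathrm{span}(z) = \mathrm{span}(e_l)$. We then define $\nu' = \mathrm{span}(e_1, e_2, \ldots, e_{l-1}, z + n)$. There is a column transformation that transforms $(e_1, e_2, \ldots, e_{l-1}, z + n)$ into

$$M := \begin{pmatrix} I_{l-1} & 0 \\ 0 & 1 \\ 0 & w \end{pmatrix}, \qquad (33)$$

where $w \in \mathbb{R}^{n-l}$. With some basic algebra we get $\|w\| < d$. Now the column vectors of $M$ still span $\nu'$, and one has that

$$P_\nu = \begin{pmatrix} I_l & 0 \\ 0 & 0 \end{pmatrix}, \qquad (34)$$

$$M^T M = \begin{pmatrix} I_{l-1} & 0 \\ 0 & 1 + \|w\|^2 \end{pmatrix}, \qquad (35)$$

$$P_{\nu'} = M(M^T M)^{-1} M^T = \begin{pmatrix} I_{l-1} & 0 & 0 \\ 0 & \frac{1}{1+\|w\|^2} & \frac{w^T}{1+\|w\|^2} \\ 0 & \frac{w}{1+\|w\|^2} & \frac{w w^T}{1+\|w\|^2} \end{pmatrix}. \qquad (36)$$

Then with some linear algebra we obtain

$$(P_\nu - P_{\nu'})^2 = \frac{1}{1+\|w\|^2} \begin{pmatrix} 0 & 0 & 0 \\ 0 & w^T w & 0 \\ 0 & 0 & w w^T \end{pmatrix}. \qquad (37)$$

Since $\|w\| < d$, (37) implies that $\|P_\nu - P_{\nu'}\| < d$.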
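The construction in Lemma 6 can be tried out numerically. The sketch below assumes that the distance $d(\nu, \nu')$ is the spectral norm of the difference of the orthogonal projectors, as suggested by the last line of the proof; the helper names, the dimensions, and the value $d = 0.3$ are hypothetical choices for illustration only.

```python
import numpy as np

def projector(basis):
    """Orthogonal projector onto the column span of `basis`."""
    q, _ = np.linalg.qr(basis)
    return q @ q.T

def subspace_distance(a, b):
    """Spectral norm of the difference of the two orthogonal projectors."""
    return float(np.linalg.norm(projector(a) - projector(b), 2))

def perturbed_plane(nu_basis, z, n):
    """Basis of nu' = span(b_1, ..., b_{l-1}, z + n), where b_1, ..., b_{l-1}
    is an orthonormal basis of the part of nu orthogonal to z."""
    q, _ = np.linalg.qr(nu_basis)
    z_hat = z / np.linalg.norm(z)
    rest = q - np.outer(z_hat, z_hat @ q)      # strip the z-direction from nu
    u, _, _ = np.linalg.svd(rest, full_matrices=False)
    return np.column_stack([u[:, : q.shape[1] - 1], z + n])

rng = np.random.default_rng(0)
dim, l, d = 8, 3, 0.3
worst = 0.0
for _ in range(200):
    nu = rng.standard_normal((dim, l))
    z = nu @ rng.standard_normal(l)            # non-zero vector in nu
    n = rng.standard_normal(dim)
    n *= 0.99 * d * np.linalg.norm(z) / np.linalg.norm(n)   # ||n|| < d ||z||
    nu_prime = perturbed_plane(nu, z, n)
    # z + n must lie in nu'
    assert np.allclose(projector(nu_prime) @ (z + n), z + n)
    worst = max(worst, subspace_distance(nu, nu_prime))

print(f"largest observed distance: {worst:.4f} (bound d = {d})")
```

Instead of rotating $\nu$ onto the coordinate plane, the sketch builds the complement of $z$ inside $\nu$ directly, which is the same construction up to a rotation.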

Footnote: For $x \in \mathbb{R}^{|T|}$, we denote by $x^T \in \mathbb{R}^n$ the $n$-vector supported on $T$ satisfying $(x^T)_T = x$.
Footnote: $\{e_1, e_2, \ldots, e_n\}$ here denotes the standard basis of $\mathbb{R}^n$.

The proof of Theorem 2 is now within reach. We shall first show that $\Omega^r \subseteq \mathrm{int}(\Omega)$. Suppose $\nu \in \Omega^r$, and let $d < 1$ be one feasible parameter that appears in the definition (17). It suffices to prove that the neighbourhood $U = \{\nu' : d(\nu, \nu') < d/(1+d)\}$ is a subset of $\Omega$, i.e. any $\nu'$ in this neighbourhood satisfies the condition in (14). This is because for any $\nu' \in U$ and $z' \in \nu' \setminus \{0\}$, one can find $z := P_\nu z' \in \nu$ such that $\|z - z'\|/\|z'\| < d/(1+d)$, which implies that $\|z - z'\|/\|z\| < d$. This, combined with the fact that $\nu \in \Omega^r$, shows $J(z'_T) < J(z'_{T^c})$ for every $|T| \le k$. Hence $U \subseteq \Omega$ follows from the arbitrariness of $z'$, as desired.

Next we have to show the converse. If $\nu \in \mathrm{int}(\Omega)$, then by definition there exists $d > 0$ such that

$$\nu' \in \Omega, \quad \forall\, d(\nu, \nu') < d. \qquad (38)$$

Now for all $z \in \nu \setminus \{0\}$ and $\|n\| < d\|z\|$, by Lemma 6 there exists $\nu'$ such that $z + n \in \nu'$ and $d(\nu, \nu') < d$. Hence $\nu' \in \Omega$, meaning that $J(z_T + n_T) < J(z_{T^c} + n_{T^c})$ for every $|T| \le k$. This implies that $\nu \in \Omega^r$. By the arbitrariness of $\nu$ we conclude that $\mathrm{int}(\Omega) \subseteq \Omega^r$.

Appendix C 
Proof of Lemma 4 

1) The lower semi-continuity of $g$ is obvious; indeed, it follows from the fact that $g$ is defined as the supremum of a collection of continuous functions [31, P38 (c)].

2) To show that $g$ is also upper semi-continuous when $M$ is compact: we will prove that $g$ is upper semi-continuous at an arbitrary $x_0 \in X$. Let $y_0$ be a point in $M$ such that $g(x_0) = f(x_0, y_0)$ (here we used the compactness of $M$). Suppose otherwise, that $g$ is not upper semi-continuous at $x_0$; then there exists $\epsilon > 0$ such that:

$$\limsup_{x \to x_0} g(x) > g(x_0) + \epsilon. \qquad (39)$$

This implies that we can find sequences $x_n$, $y_n$ ($n \ge 1$) such that $\lim_{n \to \infty} x_n = x_0$ and the following holds:

$$f(x_n, y_n) > g(x_0) + \epsilon. \qquad (40)$$

Since $M$ is compact, we can find a subsequence $y_{n_k}$ ($k \ge 1$) converging to some point $y^* \in M$. Hence

$$g(x_0) = f(x_0, y_0) \ge f(x_0, y^*) = \lim_{k \to \infty} f(x_{n_k}, y_{n_k}) \ge g(x_0) + \epsilon,$$

which is an apparent contradiction.
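The role of compactness in part 2) can be seen on a small example. The function $f$ and the spaces below are hypothetical choices made only for illustration; they are not taken from the paper.

```latex
% Take X = [0,1] and the continuous function f(x,y) = \min(1, xy).
\begin{align*}
\text{(a) } M = [0,\infty):\quad
  & g(x) = \sup_{y \ge 0} \min(1, xy) =
    \begin{cases} 0, & x = 0,\\ 1, & x > 0, \end{cases}\\
  & \text{so } g \text{ is lower semi-continuous, but }
    \limsup_{x \to 0} g(x) = 1 > 0 = g(0).\\[4pt]
\text{(b) } M = [0,Y] \text{ compact}:\quad
  & g(x) = \max_{0 \le y \le Y} \min(1, xy) = \min(1, xY),\\
  & \text{which is continuous on } X.
\end{align*}
```

In case (a) the supremum at $x > 0$ is approached only as $y \to \infty$, so no convergent subsequence $y_{n_k}$ exists and the argument of part 2) breaks down, as expected.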

Remark 4: In the above proof, the assumption that $X$, $M$ are metric spaces rather than topological spaces is useful only when showing the existence of the sequences $x_n$, $y_n$ ($n \ge 1$). Therefore, for full generality we may just assume that $X$, $M$ are topological spaces satisfying the first axiom of countability [32].

Appendix D
Proof of Theorem 3

Notation 1: In this section we define $\Omega_T$ as follows. (This is not to be confused with the definition of $\Omega_J$ or …)

$$\Omega_T := \{\nu \in G_l(\mathbb{R}^n) : J(z_T) < J(z_{T^c}),\ \forall z \in \nu \setminus \{0\}\}, \qquad (41)$$

hence $\Omega = \bigcap_{|T|=k} \Omega_T$.
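For a single vector $z$, membership in the defining condition of every $\Omega_T$ with $|T| = k$ can be tested by brute force. The sketch below assumes the separable cost $J(x) = \sum_i F(|x_i|)$ and uses $F(t) = \sqrt{t}$ as a hypothetical non-decreasing example; the function names are illustrative. It checks one vector only, whereas (41) quantifies over all non-zero $z$ in the plane $\nu$.

```python
from itertools import combinations
import numpy as np

def F(t):
    return np.sqrt(t)            # hypothetical non-decreasing sparseness measure

def J(x):
    """Separable cost J(x) = sum_i F(|x_i|) (assumed form)."""
    return float(np.sum(F(np.abs(x))))

def holds_for_every_T(z, k):
    """Brute force: J(z_T) < J(z_{T^c}) for every index set T with |T| = k."""
    for T in combinations(range(len(z)), k):
        mask = np.zeros(len(z), dtype=bool)
        mask[list(T)] = True
        if not J(z[mask]) < J(z[~mask]):
            return False
    return True

def holds_for_worst_T(z, k):
    """Shortcut for non-decreasing F: test only T = the k largest magnitudes."""
    order = np.argsort(-np.abs(z))
    return J(z[order[:k]]) < J(z[order[k:]])

spread = np.array([1.0, 1.0, 1.0, 1.0, 3.0])
peaked = np.array([0.1, 0.1, 0.1, 5.0])
print(holds_for_every_T(spread, 1), holds_for_every_T(peaked, 1))
```

When $F$ is non-decreasing, the index set of the $k$ largest magnitudes maximizes $J(z_T)$ and minimizes $J(z_{T^c})$ simultaneously, so the shortcut agrees with the brute-force test.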

Let $P^{n-1}(\mathbb{R}) = G_1(\mathbb{R}^n)$ be the $(n-1)$-dimensional projective space (i.e., the set of one-dimensional linear subspaces of $\mathbb{R}^n$), and let $\rho^n : \mathbb{R}^n \setminus \{0\} \to P^{n-1}(\mathbb{R})$ be the natural projection from $\mathbb{R}^n \setminus \{0\}$ to $P^{n-1}(\mathbb{R})$. Then $X \in P^{n-1}(\mathbb{R})$ can be expressed in homogeneous coordinates as $X = [x_1, \ldots, x_n]$ if $\rho^n((x_1, \ldots, x_n)) = X$. We note that the function $d$ defined in (11) can be naturally extended to the case where the linear subspaces $\nu$, $\nu'$ are of different dimensions, in particular the case where one of the two subspaces belongs to $G_l(\mathbb{R}^n)$ and the other belongs to $P^{n-1}(\mathbb{R})$. The following two observations about $d$, although simple in nature, are useful when dealing with the distance between elements in $G_l(\mathbb{R}^n)$ and $P^{n-1}(\mathbb{R})$:

• If $\nu_i$, $i = 1, 2, 3$ are linear subspaces of $\mathbb{R}^n$ (with possibly different dimensions), then

$$d(\nu_1, \nu_2) \le d(\nu_1, \nu_3) + d(\nu_2, \nu_3). \qquad (42)$$

• If $\nu_i$, $i = 1, 2, 3$ are linear subspaces of $\mathbb{R}^n$ (with possibly different dimensions) and $\nu_1 \subseteq \nu_2$, then

$$d(\nu_2, \nu_3) \le d(\nu_1, \nu_3). \qquad (43)$$

Notation 2: There is a partial relation $\succeq_T$ on $P^{n-1}(\mathbb{R})$, defined as follows: if $X, Y \in P^{n-1}(\mathbb{R})$ and $X$, $Y$ can be expressed in homogeneous coordinates as $[x_1, \ldots, x_n]$, $[y_1, \ldots, y_n]$, where $|x_i| \ge |y_i|$ for $i \in T$ and $|x_i| \le |y_i|$ for $i \notin T$, then $X \succeq_T Y$.

Since $F$ is non-decreasing, we observe the following property:

Property 1: Suppose an $l$-plane $\nu \in \Omega_T^c$ passes through a line $X$, and an $l$-plane $\nu'$ passes through a line $X'$, with the condition $X' \succeq_T X$. Then $\nu'$ must also belong to $\Omega_T^c$.

Definition 6: [33] Suppose $E$ is a measurable set in $\mathbb{R}^N$. The Lebesgue density of $E$ at a point $x \in \mathbb{R}^N$ is defined as $\lim_{r \to 0} \frac{\lambda(E \cap B(x, r))}{\lambda(B(x, r))}$, where $\lambda$ denotes the Lebesgue measure. If the density exists and is equal to 1, $x$ is said to have the Lebesgue density of $E$.
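As a rough illustration of Definition 6, the ratio $\lambda(E \cap B(x, r))/\lambda(B(x, r))$ can be estimated by Monte Carlo sampling for a fixed small radius. The set $E$ (the non-negative orthant in $\mathbb{R}^3$), the radius, the sample size, and the function names are assumptions chosen for this sketch.

```python
import numpy as np

def density_estimate(indicator, x, r, dim, samples, rng):
    """Monte Carlo estimate of lambda(E & B(x, r)) / lambda(B(x, r))."""
    g = rng.standard_normal((samples, dim))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = r * rng.random(samples) ** (1.0 / dim)   # uniform in the ball
    points = x + directions * radii[:, None]
    return float(np.mean(indicator(points)))

def in_orthant(points):
    """Indicator of E = {y : y_i >= 0 for all i}."""
    return np.all(points >= 0.0, axis=1)

rng = np.random.default_rng(0)
dim = 3
at_vertex = density_estimate(in_orthant, np.zeros(dim), 1e-3, dim, 200_000, rng)
at_interior = density_estimate(in_orthant, np.ones(dim), 1e-3, dim, 200_000, rng)
print(f"vertex: {at_vertex:.3f}, interior point: {at_interior:.3f}")
```

At the vertex the ratio is $2^{-3}$ for every radius, so the vertex belongs to $E$ without having the Lebesgue density of $E$, while at an interior point the ratio equals 1 for all small radii.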

The Lebesgue density theorem [33, Chapter 3, Corollary 1.5] states that almost every $x \in E$ (except for a set of Lebesgue measure zero) has the Lebesgue density of $E$. Thus this theorem essentially says that the set $E$ is "robust" when restricted to those points that have the Lebesgue density of $E$. While a point in $\mathrm{int}(E)$ must have the Lebesgue density of $E$, the converse is not always true. However, the idea in the proof of Theorem 3 is to show such a converse statement by using the monotonicity of $F$.

Proof of Theorem 3: Note that in Definition 5 of a measure zero set, "every chart" can be replaced by "a collection of charts whose coordinate neighbourhoods cover the manifold", which is an easy consequence of the fact that $C^\infty$ functions on the Euclidean space map measure zero sets into measure zero sets. With this in mind we only have to show that for any chart $(U_I, \phi_I)$ it holds that $\lambda(\phi_I([\Omega - \mathrm{int}(\Omega)] \cap U_I)) = 0$. Since

$$\begin{aligned} \Omega - \mathrm{int}(\Omega) &= \bigcap_{|T|=k} \Omega_T - \mathrm{int}\Bigl(\bigcap_{|T|=k} \Omega_T\Bigr) \\ &= \bigcap_{|T|=k} \Omega_T - \bigcap_{|T|=k} \mathrm{int}(\Omega_T) \\ &\subseteq \bigcup_{|T|=k} \bigl(\Omega_T - \mathrm{int}(\Omega_T)\bigr), \end{aligned} \qquad (44)$$

we in turn only need to prove

$$\lambda(\phi_I([\Omega_T - \mathrm{int}(\Omega_T)] \cap U_I)) = 0 \qquad (45)$$

for every $|T| \le k$. Moreover, define

$$S = \{\nu \in G_l(\mathbb{R}^n) : \text{every } x \in \nu \setminus \{0\} \text{ has at most } l - 1 \text{ zero entries}\} = \bigcap_{I \subseteq \{1, \ldots, n\},\ |I| = l} U_I. \qquad (46)$$

Then obviously $S^c$ is a zero measure set. Hence we only have to prove

$$\lambda(\phi_I([\Omega_T - \mathrm{int}(\Omega_T)] \cap S)) = 0 \qquad (47)$$

for every $|T| \le k$.

For any $\nu \in S - \mathrm{int}(\Omega_T)$, there exists a sequence $\nu^{(j)} \in S - \Omega_T$, $j = 1, 2, \ldots$, such that $\nu^{(j)} \to \nu$ as $j \to \infty$. Then for each $j$ we can find $z^{(j)} \in \nu^{(j)} - \{0\}$ such that $J(z^{(j)}_T) \ge J(z^{(j)}_{T^c})$. Because $P^{n-1}(\mathbb{R})$ is compact, $\rho^n(z^{(j)})$ has a subsequence converging to some $X \in P^{n-1}(\mathbb{R})$. Moreover, from the observations in (42) and (43) we have

$$\begin{aligned} d(X, \nu) &\le d(X, \rho^n(z^{(j)})) + d(\rho^n(z^{(j)}), \nu) \\ &\le d(X, \rho^n(z^{(j)})) + d(\nu^{(j)}, \nu) \\ &\to 0, \end{aligned} \qquad (48)$$

therefore $X \subseteq \nu$.

Suppose $x = (x_1, \ldots, x_n)$ and $X = \rho^n(x)$. Since $\nu \in S$, at most $l - 1$ of the entries of $x$ can be zero. Hence there exists an $l$-element index set $I' \subseteq \{1, \ldots, n\}$ such that $\{x_i : i \in I'\}$ has at least one non-zero element and $\{x_i : i \notin I'\}$ has no zero element, and therefore we can choose (with much liberty) an invertible matrix $B \in \mathbb{M}(l, l)$ such that the first column of $B$ is $x_{I'}$. With this construction and in view of Property 1, it is obvious that the Lebesgue density of $B \circ \phi_{I'}(S - \Omega_T)$ at $B \circ \phi_{I'}(\nu)$ is strictly positive (it is bounded below by a negative power of 2). Since $(B \circ \phi_{I'}, U_{I'})$ and $(\phi_I, U_I)$ are $C^\infty$-compatible, the Lebesgue density of $\phi_I(S - \Omega_T)$ at $\phi_I(\nu)$ is also positive, therefore the Lebesgue density of $\phi_I(S \cap \Omega_T)$ at $\phi_I(\nu)$ cannot be 1. Recalling that $\nu$ is arbitrarily chosen from $S - \mathrm{int}(\Omega_T)$, by the Lebesgue density theorem $\phi_I(S \cap [\Omega_T - \mathrm{int}(\Omega_T)])$ must be a zero measure set. ■

Appendix E 
Proof of Corollary 3 

Informally, the idea of the proof can be described as follows. Suppose $\mu(\Omega_J \setminus \Omega_J^r) = 0$. Let $H \subseteq G_m(\mathbb{R}^n)$ be the set of orthogonal complements of the $l$-dimensional subspaces in $\Omega_J \setminus \Omega_J^r$. Since $G_m(\mathbb{R}^n)$ is isomorphic to $G_l(\mathbb{R}^n)$ (recall that $l := n - m$), we have $\mu(H) = 0$ as well. Noticing that an $m \times n$ matrix can be described as an $m$-dimensional linear space together with $m^2$ coordinates, we can think of $\mathbb{M}(m, n)$ as having a structure similar to $G_m(\mathbb{R}^n) \times \mathbb{R}^{m^2}$. However, $H \times \mathbb{R}^{m^2}$ has measure 0 by the property of the product measure, therefore the set of $m \times n$ matrices whose row spaces lie in $H$ has Lebesgue measure zero. Then the probabilistic equivalence of ERC and RRC follows from the definition of absolute continuity.

Now the major deficiency of the above intuition is that $\mathbb{M}(m, n)$ is not in fact identical to the product space $G_m(\mathbb{R}^n) \times \mathbb{R}^{m^2}$. Thus we need the following concept:

Definition 7: [23], [34] A vector bundle is a 4-tuple $(E, X, \pi, \mathbb{R}^k)$ in which the base space $X$ is a $C^\infty$ manifold, and the following conditions are satisfied:

(a) Each $x \in X$ is associated with a fiber $E_x$, which is a $k$-dimensional vector space, and we can write the total space $E$ as a disjoint union:

$$E = \coprod_{x \in X} E_x, \qquad (49)$$

and the projection map $\pi : E \to X$ satisfies

$$\pi(\xi) = x, \quad \forall \xi \in E_x. \qquad (50)$$

(b) There exists an open cover $\mathcal{U}$ of $X$, such that for each $U \in \mathcal{U}$, there is a corresponding family of frames

$$e_U(x) = (e_{U,1}(x), \ldots, e_{U,k}(x)).$$

Here, $e_U(x)$ is a frame of $E_x$ for each $x \in U$. If $U, V \in \mathcal{U}$ and $U \cap V \ne \emptyset$, both $e_U(x)$ and $e_V(x)$ are defined on $U \cap V$, therefore there is a $k \times k$ non-degenerate matrix $A_{UV}(x)$ such that

$$e_U(x) = e_V(x) A_{UV}(x). \qquad (51)$$

Moreover, the map $A_{UV} : U \cap V \to GL(\mathbb{R}^k)$ is $C^\infty$.

Remark 5: If the above conditions are met, then the space $E$ can be parameterized to be a $C^\infty$ manifold. The $C^\infty$ structure imposed on $E$ is such that for each $U \in \mathcal{U}$, the map $h_U : U \times \mathbb{R}^k \to \pi^{-1}(U)$, $(x, \alpha_1, \ldots, \alpha_k) \mapsto \sum_{i=1}^{k} \alpha_i e_{U,i}(x)$ is a $C^\infty$ homeomorphism. See for example [34, Theorem 2.4].

We now construct a particular vector bundle in which the base space is $G_m(\mathbb{R}^n)$. The total space is denoted by $E_m(\mathbb{R}^n)$, which consists of all pairs $(\nu, p_1, \ldots, p_m)$ where $p_i \in \mathbb{R}^n$, $1 \le i \le m$, are vectors in the plane $\nu$. The projection $\pi$ maps $(\nu, p_1, \ldots, p_m)$ into $\nu$. The linear space structure on the fiber attached to $\nu$ is the linear structure of the linear transforms on $\nu$. We can choose the open cover $\mathcal{U}$ as the family $\{U_I\}_{|I|=m}$ in a similar manner as described in Section II, Part C. For an $n \times m$ matrix $Y$ whose column vectors span $\nu$, we define the frame at $\nu$ by

$$e_{U_I}(\nu) = Y \cdot (Y_I)^{-1}. \qquad (52)$$

It is obvious that this definition of the frame is independent of the specific choice of $Y$, and it is routine to check the compatibility condition (51) for intersecting sets $U_I$, $U_{I'}$ and that $A_{U_I U_{I'}}$ is $C^\infty$. In this way we have constructed a vector bundle according to Definition 7. We summarize as follows:

Footnote: Here we abuse the notation by denoting by $\mu$ the Haar measure both on $G_l(\mathbb{R}^n)$ and on $G_m(\mathbb{R}^n)$, since the two manifolds are isomorphic.
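The independence of the frame (52) from the choice of $Y$ can be checked directly: replacing $Y$ by $YG$ with $G$ invertible leaves $Y (Y_I)^{-1}$ unchanged. The sketch below assumes that $Y_I$ denotes the $m \times m$ submatrix of $Y$ formed by the rows indexed by $I$; the dimensions, the index set, and the function name are hypothetical.

```python
import numpy as np

def frame(Y, I):
    """Frame Y (Y_I)^{-1}, with Y_I the square submatrix of rows indexed by I."""
    return Y @ np.linalg.inv(Y[I, :])

rng = np.random.default_rng(0)
n, m = 6, 3
I = [0, 2, 5]                       # hypothetical index set with |I| = m
Y = rng.standard_normal((n, m))     # columns span an m-plane nu
G = rng.standard_normal((m, m))     # change of basis inside nu
Y2 = Y @ G                          # same column space, different basis

E1, E2 = frame(Y, I), frame(Y2, I)
print("frames agree:", np.allclose(E1, E2, atol=1e-6))
print("rows indexed by I form the identity:", np.allclose(E1[I, :], np.eye(m)))
```

The second check shows the normalization that makes the frame canonical on $U_I$: the rows of the frame indexed by $I$ always form the identity matrix.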

Proposition 3: $(E_m(\mathbb{R}^n), G_m(\mathbb{R}^n), \pi, \mathbb{R}^{m^2})$ is a vector bundle, thus in particular $E_m(\mathbb{R}^n)$ is an $mn$-dimensional $C^\infty$ manifold.

Recall that measure zero sets are well defined on $C^\infty$ manifolds. From Remark 5 it is easy to see the following fact:

Proposition 4: The natural projection $p : E_m(\mathbb{R}^n) \to \mathbb{M}(m, n)$, $(\nu, p_1, \ldots, p_m) \mapsto [p_1, \ldots, p_m]^T$ is $C^\infty$, and therefore maps measure zero subsets of $E_m(\mathbb{R}^n)$ into measure zero subsets of $\mathbb{M}(m, n)$.

Returning to the proof of Corollary 3, we first obtain that $\mu(H) = 0$, where $H \subseteq G_m(\mathbb{R}^n)$ is the set of orthogonal complements of the $l$-dimensional subspaces in $\Omega_J \setminus \Omega_J^r$, as before. Then for each $U_I$, $H \cap U_I$ is a measure zero subset of $U_I$. By the property of the product measure we have that $(H \cap U_I) \times \mathbb{R}^{m^2}$ is a measure zero subset of $U_I \times \mathbb{R}^{m^2}$. Since $h_{U_I}$ in Remark 5 is a $C^\infty$ homeomorphism, we obtain that $\pi^{-1}(H \cap U_I)$ is a measure zero subset of $\pi^{-1}(U_I)$. Finally, $\pi^{-1}(H) = \bigcup_I \pi^{-1}(H \cap U_I)$ is a measure zero subset of $\pi^{-1}(G_m(\mathbb{R}^n)) = E_m(\mathbb{R}^n)$. This, combined with Proposition 4, shows that $\mathcal{N}(A) \in \Omega_J \setminus \Omega_J^r$ only if $A$ falls into a Lebesgue measure zero set in $\mathbb{M}(m, n)$, which has probability zero if the probability distribution of $A$ is absolutely continuous (with respect to the Lebesgue measure).

Appendix F 
Proof of Theorem 4 

Since the value of $\theta_J$ depends on the null space of the measurement matrix, in the following it is considered as a function of $\nu \in G_l(\mathbb{R}^n)$.

Lemma 7: Let $F \in \mathcal{M}$, $q \in (0, 1]$. If $\lim_{x \downarrow 0} F(x)/x^q$ or $\lim_{x \to \infty} F(x)/x^q$ exists and is positive, then $\theta_{l_q} \le \theta_J$ for any $\nu \in G_l(\mathbb{R}^n)$.

Proof: We only prove the case where $\lim_{t \downarrow 0} F(t)/t^q$ exists and is positive, because the case where $\lim_{t \to \infty} F(t)/t^q$ exists and is positive is essentially similar. By definition we only have to prove the following for any $z \in \nu \setminus \{0\}$ and $T$ satisfying $|T| \le k$:

$$\frac{\|z_T\|_q^q}{\|z_{T^c}\|_q^q} \le \theta_J. \qquad (53)$$

Notice that for any $t \in \mathbb{R}$, the vector $tz$ still belongs to $\mathcal{N}(A)$, hence

$$\text{left side of (53)} = \lim_{t \downarrow 0} \frac{J(t z_T)}{J(t z_{T^c})} \qquad (54)$$

$$\le \theta_J, \qquad (55)$$

where (54) is because $\lim_{t \downarrow 0} F(t)/t^q$ exists and is positive, and (55) is from the definition of supremum. ■

Footnote: Alternatively, we can construct the same bundle as the Whitney sum of $m$ universal bundles [23, Definition 2.9, Definition 2.26].
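The limit in (54) can be observed numerically. The sketch assumes the separable cost $J(x) = \sum_i F(|x_i|)$ and takes $F(t) = \log(1 + t^q)$ with $q = 0.5$ as a hypothetical example for which $F(t)/t^q \to 1$ as $t \downarrow 0$; the vector $z$ and the index set $T$ are arbitrary illustrative choices.

```python
import numpy as np

q = 0.5

def F(t):
    return np.log1p(t**q)        # F(t) / t^q -> 1 as t -> 0+

def J(x):
    """Separable cost J(x) = sum_i F(|x_i|) (assumed form)."""
    return float(np.sum(F(np.abs(x))))

z = np.array([3.0, -1.0, 0.5, 2.0, -0.25])
T = np.array([True, False, False, False, False])    # |T| = 1

# Ratio of the l_q costs of z on T and on its complement.
lq_ratio = float(np.sum(np.abs(z[T])**q) / np.sum(np.abs(z[~T])**q))

# The same ratio for J, evaluated on tz for shrinking t.
scales = (1.0, 1e-2, 1e-4, 1e-8)
scaled = [J(t * z[T]) / J(t * z[~T]) for t in scales]

for t, value in zip(scales, scaled):
    print(f"t = {t:g}: J-ratio = {value:.6f}")
print(f"l_q ratio: {lq_ratio:.6f}")
```

As $t$ shrinks, the $J$-ratio of the scaled vector approaches the $l_q$ ratio, which is the mechanism behind the inequality $\theta_{l_q} \le \theta_J$.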
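Lemma 8: $\overline{\Omega}_{l_q} = \{\nu : \theta_{l_q} \le 1\}$.

Proof: First we prove that $\overline{\Omega}_{l_q} \subseteq \{\nu : \theta_{l_q} \le 1\}$. This is because Lemma 1 shows that $\{\nu : \theta_{l_q} \le 1\}$ is closed, and $\Omega_{l_q} \subseteq \{\nu : \theta_{l_q} \le 1\}$. On the other hand, it is obvious that $\{\nu : \theta_{l_q} \le 1\} \subseteq \overline{\Omega}_{l_q}$. The proof is complete. ■

Lemma 9: Given $\nu \in G_l(\mathbb{R}^n)$, if $\theta_{l_q} \le \theta_J$, then $\Omega_J \subseteq \overline{\Omega}_{l_q}$.

Proof:

$$\Omega_J \subseteq \{\nu : \theta_J \le 1\} \subseteq \{\nu : \theta_{l_q} \le 1\} = \overline{\Omega}_{l_q}. \qquad (56)$$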



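Theorem 4 then follows easily from the following lemma:

Lemma 10:

$$\mu(\{\nu : \theta_{l_q} < 1\}) = \mu(\{\nu : \theta_{l_q} \le 1\}). \qquad (57)$$

Proof: Define

$$\theta_{l_q, T} := \sup_{z \in \nu \setminus \{0\}} \frac{\|z_T\|_q^q}{\|z_{T^c}\|_q^q}. \qquad (58)$$

Obviously $\theta_{l_q} = \max_{|T| \le k} \theta_{l_q, T}$, therefore we only have to prove that for any $|T| \le k$ we have $\mu(\{\nu : \theta_{l_q, T} < 1\}) = \mu(\{\nu : \theta_{l_q, T} \le 1\})$. Choose an arbitrary $I$ which does not intersect with $T$ (since $k + l \le n$, one can always find such an $I$). Then we only have to prove:

$$\lambda(\{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T}(\phi_I^{-1}(X_I)) = 1\}) = 0. \qquad (59)$$

In the following we will write $\theta_{l_q, T}$ instead of $\theta_{l_q, T}(\phi_I^{-1}(X_I))$ for simplicity. From the continuity of the measure, it then suffices to prove:

$$\lambda(M \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} < 1\}) = \lambda(M \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} \le 1\}) \qquad (60)$$

for any bounded set $M \subseteq \mathbb{M}(m, l)$.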

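From the dilation property of the Lebesgue measure we have

$$\lambda(M^{T,a} \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} \le a\}) = a^{|T| l}\, \lambda(M \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} \le 1\}), \qquad (61)$$

where $M^{T,a}$ is the dilation of the set $M$ in the rows associated with $T$ by the factor $a$. Hence

$$\begin{aligned} \lambda(M \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} < 1\}) &= \lambda\Bigl(\bigcup_{i \ge 1} \bigl[M \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} \le 1 - 1/i\}\bigr]\Bigr) \\ &\ge \lambda\Bigl(\bigcup_{i \ge 1} \bigl[M^{T, 1 - 1/i} \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} \le 1 - 1/i\}\bigr]\Bigr) \\ &= \lim_{i \to \infty} \lambda\bigl(M^{T, 1 - 1/i} \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} \le 1 - 1/i\}\bigr) \\ &= \lim_{i \to \infty} (1 - 1/i)^{|T| l}\, \lambda(M \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} \le 1\}) \\ &= \lambda(M \cap \{X_I \in \mathbb{M}(m, l) : \theta_{l_q, T} \le 1\}), \end{aligned}$$

which proves the validity of (60). ■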